CHARACTERIZATION OF THE YEAST TRANSCRIPTOME

TECHNICAL FIELD OF THE INVENTION

This invention is related to the characterization of the expressed genes 
of the yeast genome. More particularly, it is related to the identification and 
use of previously unrecognized genes. 

BACKGROUND OF THE INVENTION

It is by now axiomatic that the phenotype of an organism is largely 
determined by the genes expressed within it. These expressed genes can be 
represented by a "transcriptome", conveying the identity of each expressed
gene and its level of expression for a defined population of cells. Unlike the
genome, which is essentially a static entity, the transcriptome can be
modulated by both external and internal factors. The transcriptome thereby 
serves as a dynamic link between an organism's genome and its physical 
characteristics. 

The transcriptome as defined above has not been characterized in any 
eukaryotic or prokaryotic organism, largely because of technological
limitations. However, some general features of gene expression patterns
were elucidated two decades ago through RNA-DNA hybridization
measurements (Bishop et al., 1974; Hereford and Rosbash, 1977). In many
organisms, it was thus found that at least three classes of transcripts could be
identified, with either high, medium, or low levels of expression, and the
numbers of transcripts per cell were estimated (Lewin, 1980). These data of
course provided little information about the specific genes that were members
of each class. Data on the expression levels of individual genes have
accumulated as new genes were discovered. However, in only a few
instances have the absolute levels of expression of particular genes been
measured and compared to other genes in the same cell type.

Description of any cell's transcriptome would therefore provide new 
information useful for understanding numerous aspects of cell biology and 
biochemistry. 

SUMMARY OF THE INVENTION

It is an object of the present invention to provide genes which are 
involved in cell cycle progression.

It is another object of the present invention to provide methods of using 
the genes to affect the cell cycle. 

It is another object of the present invention to provide methods for screening
candidate antifungal drugs.

Another object of the invention is to provide a method for obtaining 
human homologs of the yeast genes which are involved in cell cycle 
progression. 

Another object of the invention is to provide probes for ascertaining
phase in the cell cycle of a cell.

These and other objects of the invention are achieved by providing the 
art with one or more of the embodiments described below. According to one 
embodiment of the invention an isolated DNA molecule is provided. It 
comprises a yeast gene which is involved in cell cycle progression selected
from the group of NORF genes identified in Table 3 or 4.

According to another embodiment of the invention a method of using 
yeast genes is provided. The method is for affecting the cell cycle of a cell.
The method comprises the step of:

administering to a cell an isolated DNA molecule comprising a
yeast gene which is involved in cell cycle progression selected from the
differentially expressed genes identified in Tables 1, 2, 3 and 4.

In yet another embodiment of the invention a method for screening 
candidate antifungal drugs is provided. The method comprises the steps of:

contacting a test substance with a yeast cell;

monitoring expression of a yeast gene which is involved in cell
cycle progression selected from the group of yeast genes identified in Tables
1, 2, 3 and 4, wherein a test substance which modifies the expression of the
yeast gene is a candidate antifungal drug.

In still another embodiment of the invention a method for identifying 
human genes which are involved in cell cycle progression is provided. The 
method comprises the step of: 

hybridizing a probe comprising at least 14 contiguous
nucleotides of a yeast gene which is differentially expressed between at least 
two phases selected from the group consisting of log phase, S phase, and 
G2/M phase, wherein the yeast gene is identified in Table 1, 2, 3, or 4. 

Also provided by the present invention are isolated DNA molecules, 
which comprise probes for ascertaining phase in the cell cycle of a cell, 
wherein the probe comprises at least 14 contiguous nucleotides of a NORF 
gene as identified in Table 3 or 4. 

These and other embodiments of the invention which will be apparent 
to those of skill in the art upon reading the detailed disclosure provided 
below, make available to the art hitherto unrecognized genes, and information 
about the expression of genes globally at the organismal level. We provide 
the first description of a transcriptome, determined in S. cerevisiae cells. This 
organism was chosen because it is widely used to clarify the biochemical and 
physiologic parameters underlying eukaryotic cellular functions and because 
it is the only eukaryote in which the entire genome has been defined at the 
nucleotide level (Goffeau, et al., 1996).

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1. Schematic of SAGE Method and Genome Analysis.

In applying SAGE to the analysis of yeast gene expression patterns, the 3'
most NlaIII site was used to define a unique position in each transcript and
to provide a site for ligation of a linker with a BsmFI site. The type IIs
enzyme BsmFI, which cleaves a defined distance from its non-palindromic
recognition site, was then used to generate a 15 bp SAGE tag (designated by
the black arrows), which includes the NlaIII site. Automated sequencing of
concatenated SAGE tags allowed the routine identification of about a
thousand tags per sequencing gel. Once sequenced, the abundance of each
SAGE tag was calculated, and each tag was used to search the entire yeast
genome to identify its corresponding gene. The lower panel shows a small
region of Chromosome 15. Gray arrows indicate all potential SAGE tags
(NlaIII sites) and black arrows indicate 3' most SAGE tags. The total number
of tags observed for each potential tag is indicated above (+ strand) or below
(- strand) the tag. As expected, the observed SAGE tags were associated
with the 3' end of expressed genes.

Figure 2. Sampling of Yeast Gene Expression. 

Analysis of increasing amounts of ascertained tags reveals a plateau in the
number of unique expressed genes. Triangles represent genes with known
functions, squares represent genes predicted on the basis of sequence
information, and circles represent total genes.

Figure 3. Virtual Rot.

(a) Abundance Classes in the Yeast Transcriptome. The transcript abundance
is plotted in reverse order on the abscissa, whereas the fraction of total
transcripts with at least that abundance is plotted on the ordinate. The dotted
lines identify the three components of the curve, 1, 2, and 3. This is
analogous to a Rot curve derived from reassociation kinetics where the
product of initial RNA concentration and time is plotted on the abscissa, and
the percent of labeled cDNA that hybridizes to excess mRNA is plotted on
the ordinate.

(b) Comparison of Virtual Rot and Rot Components. Transitions and data
from virtual Rot components were calculated from the data in Figure 3A,
while data for Rot components were obtained from Hereford and Rosbash,
1977.

Figure 4. Chromosomal Expression Map for S. cerevisiae. Individual yeast 
genes were positioned on each chromosome according to their open reading 
frame (ORF) start coordinates. Abundance levels of tags corresponding to 
each gene are displayed on the vertical axis, with transcription from the +
strand indicated above the abscissa and that from the - strand indicated below.
Yellow bands at ends of the expanded chromosome represent telomeric 
regions that are undertranscribed (see text for details). 

Figure 5. Northern Blot Analysis of Representative Genes. TDH2/3,
TEF1/2 and NORF1 are expressed relatively equally in all three states (lane
1, G2/M arrested; lane 2, S phase arrested; lane 3, log phase), while RNR4,
RNR2, and NORF5 are highly expressed in S-phase arrested cells. The
expression level observed by SAGE (number of tags) is noted below each
lane and was highly correlated with quantitation of the Northern blot by
PhosphorImager analysis (r² = 0.97).

Table Legends

Table 1. Highly Expressed Genes 

Tag represents the 10 bp SAGE tag adjacent to the NlaIII site; Gene
represents the gene or genes corresponding to a particular tag (multiple genes
that match unique tags are from related families, with an average identity of
93%); Locus and Description denote the locus name, and functional
description of each ORF, respectively; Copies/cell represents the abundance
of each transcript in the SAGE library, assuming 15,000 total transcripts per
cell and 60,633 ascertained transcripts.

Table 2. Expression of Putative Coding Sequences

Table columns are the same as for Table 1.

Table 3. Expression of NORF genes

SAGE Tag, Locus, and Copies/cell are the same as for Table 1; Chr and Tag
Pos denote the chromosome and position of each tag; ORF Size denotes the
size of the ORF corresponding to the indicated tag. In each case, the tag was
located within or less than 250 bp 3' of the NORF.
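
The Copies/cell column described in the Table 1 legend can be illustrated
with a short calculation. The following is only a minimal sketch: the function
name copies_per_cell is hypothetical, and the two default values are the
15,000 transcripts per cell and 60,633 ascertained transcripts stated in the
legend.

```python
def copies_per_cell(tag_count, total_tags=60_633, transcripts_per_cell=15_000):
    """Scale a raw SAGE tag count to an estimated number of mRNA copies per cell.

    Assumes, as the Table 1 legend does, that the SAGE library is a
    proportional sample of a cell containing `transcripts_per_cell` mRNAs.
    """
    return tag_count * transcripts_per_cell / total_tags


if __name__ == "__main__":
    for count in (1, 31, 250):
        print(count, "tags ->", round(copies_per_cell(count), 2), "copies/cell")
```

Under these assumptions a tag seen once corresponds to roughly a quarter of a
copy per cell, which is in line with the lower bound of about 0.3 copies per
cell reported for this data set.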

DETAILED DESCRIPTION

It is a discovery of the present invention that certain hitherto unknown 
genes (the NORFs) exist and are expressed in yeast. These genes, as well as
other previously identified and previously postulated genes, can be used to
study, monitor, and affect phase of cell cycle. The present invention provides
information on which genes are differentially expressed during the cell cycle.
Differentially expressed genes can be used as markers of phases of the cell
cycle. They can also be used to effect a change in the phase of the cell cycle.
In addition, they can be used to screen for drugs which affect the cell cycle, 
by affecting expression of the genes. Human homologs of these eukaryotic 
genes are also presumed to exist, and can be identified using the yeast genes 
as probes or primers to identify the human homologs. 

New genes termed NORFs (not previously assigned open reading
frames) have been found. They are uniquely identified by their SAGE tags.
In addition their entire nucleotide sequence is known and publicly available.
In general, these were not previously identified as genes due to their small
size. However, they have now been found to be expressed.

Differentially expressed yeast genes are those whose expression varies 
by a statistically significant difference (to greater than 95% confidence level) 
within different growth phases, particularly log phase, S phase, and G2/M. 
Preferably the difference is greater than 10%, 25%, 50%, or 100%. The 
genes which have been found to have such differential expression
characteristics are: NORF Nos. 1, 2, 4, 5, 6, 17, 25, 27, TEF1/TEF2, ENO2,
ADH1, ADH2, PGK1, CUP1A/CUP1B, PYK1, YKL056C, YMR116C,
YEL033W, YOR182C, YCR013C, ribonucleotide reductase 2 and 4, and
YJROSSC.

The DNA molecules according to the invention can be genomic or
cDNA. Preferably they are isolated free of other cellular components such
as membrane components, proteins, and lipids. They can be made by a cell
and isolated, or synthesized using PCR or an automatic synthesizer. Any
technique for obtaining a DNA of known sequence may be used. Methods
for purifying and isolating DNA are routine and are known in the art.

To administer yeast genes to cells, any DNA delivery techniques known 
in the art may be used, without limitation. These include liposomes, 
transfection, transduction, transformation, viral infection, and electroporation.
Vectors for particular purposes and characteristics can be selected by the
skilled artisan for their known properties. Cells which can be used as gene
recipients are yeast and other fungi, mammalian cells, including human cells,
and bacterial cells.

Antifungal drugs can be identified using yeast cells as described herein.
Expression of a differentially expressed gene can be monitored by any means
known in the art. When a test substance affects the expression of such a
differentially expressed gene, it is a candidate drug for affecting the growth
properties of fungi, and may be useful as an antifungal agent.

Because differentially expressed genes are likely to be involved in cell
cycle progression, it is likely that these genes are conserved among species.
The differentially expressed genes identified by the present invention can be
used to identify homologs in humans and other mammals. Means for
identifying homologous genes among different species are well known in the
art. Briefly, stringency of hybridization can be reduced so that imperfectly
matching sequences hybridize. This can be in the context of, inter alia,
Southern blots, Northern blots, colony hybridization or PCR. Any
hybridization technique which is known in the art can be used.

Probes according to the present invention are isolated DNA molecules 
which have at least 10, and preferably at least 12, 14, 16, 18, 20, or 25 
contiguous nucleotides of a particular NORF gene or other differentially 
expressed gene. The probes may or may not be labeled. They may be used 
as primers for PCR or for Southern or Northern blots. Preferably the probes
are anchored to a solid support. More preferably they are present on an array
so that multiple probes can simultaneously hybridize to a single biological
sample. The probes can be spotted onto the array or synthesized in situ on
the array. See Lockhart et al., Nature Biotechnology, Vol. 14, December
1996, "Expression monitoring by hybridization to high-density
oligonucleotide arrays." A single array can contain more than 100, 500 or
even 1,000 different probes in discrete locations.

The above disclosure generally describes the present invention. A more
complete understanding can be obtained by reference to the following specific
examples which are provided herein for purposes of illustration only, and are
not intended to limit the scope of the invention.

EXAMPLE

Summary 

We have analyzed the set of genes expressed from the yeast genome, herein
called the transcriptome, using serial analysis of gene expression (SAGE).
Analysis of 60,633 transcripts revealed 4,665 genes, with expression levels
ranging from 0.3 to over 200 transcripts per cell. Of these genes, 1,981 had
known functions, while 2,684 were previously uncharacterized. Integration
of positional information with gene expression data allowed the generation
of chromosomal expression maps, identifying physical regions of
transcriptional activity, and identified genes that had not been predicted by
sequence information alone. These studies provide insight into global
patterns of gene expression in yeast and demonstrate the feasibility of
genome-wide expression studies in eukaryotes.

Results 

Characteristics and Rationale of SAGE Approach 
Several methods have recently been described for the high throughput
evaluation of gene expression (Nguyen et al., 1995; Schena et al., 1995;
Velculescu et al., 1995). We used SAGE (Serial Analysis of Gene
Expression) because it can provide quantitative gene expression data without
the prerequisite of a hybridization probe for each transcript. The SAGE
technology is based on two basic principles (Figure 1). First, a short
sequence tag (9-11 bp) contains sufficient information to uniquely identify a
transcript, provided that it is derived from a defined location within that
transcript. Second, many transcript tags can be concatenated into a single 
molecule and then sequenced, revealing the identity of multiple tags 
simultaneously. The expression pattern of any population of transcripts can 
be quantitatively evaluated by determining the abundance of individual tags 
and identifying the gene corresponding to each tag. 
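
The two principles above can be sketched in silico. This is an illustrative
sketch only, not the laboratory protocol: it assumes a tag is the 10 bp
immediately 3' of the 3'-most NlaIII site (recognition sequence CATG) in a
transcript, and the names extract_tag and count_tags are hypothetical.

```python
from collections import Counter

ANCHOR_SITE = "CATG"  # NlaIII recognition sequence
TAG_LENGTH = 10       # assumed tag length, per the 10 bp tags in Table 1


def extract_tag(transcript):
    """Return the tag adjacent to the 3'-most anchoring site, or None.

    None is returned when the transcript has no CATG site or when fewer
    than TAG_LENGTH bases follow the last site.
    """
    seq = transcript.upper()
    pos = seq.rfind(ANCHOR_SITE)  # 3'-most occurrence of the site
    if pos == -1:
        return None
    start = pos + len(ANCHOR_SITE)
    tag = seq[start:start + TAG_LENGTH]
    return tag if len(tag) == TAG_LENGTH else None


def count_tags(transcripts):
    """Tally tag abundance across a collection of transcript sequences."""
    counts = Counter()
    for seq in transcripts:
        tag = extract_tag(seq)
        if tag is not None:
            counts[tag] += 1
    return counts
```

Transcripts that yield no tag are skipped rather than counted, which mirrors
the limitation that transcripts lacking an anchoring site are not detected.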

Genome-wide expression 

In order to maximize representation of genes involved in normal growth and 
cell-cycle progression, SAGE libraries were generated from yeast cells in
three states: log phase, S phase arrested and G2/M phase arrested. In total,
SAGE tags corresponding to 60,633 total transcripts were identified
(including 20,184 from log phase, 20,034 from S phase arrested, and 20,415
from G2/M phase arrested cells). Of these tags, 56,291 tags (93%) precisely
matched the yeast genome, 88 tags matched the mitochondrial genome, and
91 tags matched the 2 micron plasmid.

The number of SAGE tags required to define a yeast transcriptome 
depends on the confidence level desired for detecting low abundance mRNA
molecules. Assuming the previously derived estimate of 15,000 mRNA
molecules per cell (Hereford and Rosbash, 1977), 20,000 tags would
represent a 1.3 fold coverage even for mRNA molecules present at a single
copy per cell, and would provide a 72% probability of detecting such
transcripts (as determined by Monte Carlo simulations). Analysis of 20,184
tags from log phase cells identified 3,298 unique genes. As an independent
confirmation of mRNA copy number per cell, we compared the expression
level of SUP44/RPS4, one of the few genes whose absolute mRNA levels
have been reliably determined by quantitative hybridization experiments (Iyer
and Struhl, 1996), with expression levels determined by SAGE.
SUP44/RPS4 was measured by hybridization at 75 +/- 10 copies/cell (Iyer
and Struhl, 1996), in good accord with the SAGE data of 63 copies/cell,
suggesting that the estimate of 15,000 mRNA molecules per cell was
reasonably accurate. Analysis of SAGE tags from S phase arrested and G2/M
phase arrested cells revealed similar expression levels for this gene (range 52
to 55 copies/cell), as well as for the vast majority of expressed genes. As less
than 1% of the genes were expressed at dramatically different levels among 
these three states (see below), SAGE tags obtained from all libraries were 
combined and used to analyze global patterns of gene expression. 

Analysis of ascertained tags at increasing increments revealed that the
number of unique transcripts plateaued at ~60,000 tags (Figure 2). This
suggested that generation of further SAGE tags would yield few additional
genes, consistent with the fact that sixty thousand transcripts represented a
four-fold redundancy for genes expressed as low as 1 transcript per cell.
Likewise, Monte Carlo simulations indicated that analysis of 60,000 tags
would identify at least one tag for a given transcript 97% of the time if its
expression level was one copy per cell.
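
The sampling argument above can be approximated as follows. The details of
the Monte Carlo simulations used here are not given, so this sketch rests on
a stated assumption: each tag is an independent draw that hits a given
transcript with probability (copies per cell) / 15,000. The function names
are hypothetical, and the closed-form binomial expression is offered as a
cross-check, not as the method actually used.

```python
import random


def detection_probability(n_tags, copies_per_cell=1.0, transcripts_per_cell=15_000):
    """Closed-form chance of observing at least one tag for a transcript."""
    p = copies_per_cell / transcripts_per_cell  # chance a single tag hits it
    return 1.0 - (1.0 - p) ** n_tags


def simulate_detection(n_tags, copies_per_cell=1.0, transcripts_per_cell=15_000,
                       trials=300, seed=0):
    """Monte Carlo estimate of the same probability under the same assumption."""
    rng = random.Random(seed)
    p = copies_per_cell / transcripts_per_cell
    detected = 0
    for _ in range(trials):
        # Draw tags one at a time; any() stops at the first hit.
        if any(rng.random() < p for _ in range(n_tags)):
            detected += 1
    return detected / trials
```

Under this simple model the closed form gives roughly 0.74 for 20,000 tags
and 0.98 for 60,000 tags, close to but not identical with the 72% and 97%
figures reported from the simulations, whose exact setup may have differed.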

The 56,291 tags that precisely matched the yeast genome represented 
4,665 different genes. This number is in agreement with the estimate of 
3,000 to 4,000 expressed genes obtained by RNA-DNA reassociation kinetics 
(Hereford and Rosbash, 1977). These expressed genes included 85% of the 
genes with characterized functions (1,981 of 2,340), and 76% of the total 
genes predicted from analysis of the yeast genome (4,665 of 6,121). These 
numbers are consistent with a relatively complete sampling of the yeast
transcriptome given the limited number of physiological states examined and
the large number of genes predicted solely on the basis of genomic sequence 
analysis. 

The transcript expression per gene was observed to vary from 0.3 to 
over 200 copies per cell. Analysis of the distribution of gene expression 
levels revealed several abundance classes that were similar to those observed 
in previous studies using reassociation kinetics. A "virtual Rot" of the genes 
observed by SAGE (Figure 3A) identified three main components of the 
transcriptome with abundances ranging over three orders of magnitude. A 
Rot curve derived from RNA-cDNA reassociation kinetics also contained
three main components distributed over a similar range of abundances 
(Hereford and Rosbash, 1977). Although the kinetics of reassociation of a 
particular class of RNA and cDNA may be affected by numerous 
experimental variables, there were striking similarities between Rot and 
virtual Rot analyses (Figure 3B). Because Rot analysis may not detect all 
transcripts of low abundance (Lewin, 1980), it is not surprising that SAGE 
revealed both a larger total number of expressed genes and a higher fraction 
of the transcriptome belonging to the low abundance transcript class. 
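
The "virtual Rot" curve can be computed directly from per-gene abundances.
The sketch below follows the description in the Figure 3 legend (abundance in
decreasing order against the fraction of total transcripts with at least that
abundance); the function name virtual_rot and the input format, a mapping of
gene name to copies per cell, are assumptions made for illustration.

```python
def virtual_rot(copies_by_gene):
    """Return (abundance, cumulative fraction of transcripts) pairs.

    Pairs are ordered from the most to the least abundant class. Each
    fraction is the share of all transcripts contributed by genes expressed
    at that abundance or higher.
    """
    total = sum(copies_by_gene.values())
    # Sum the transcripts contributed by all genes sharing an abundance level.
    mass_at_level = {}
    for copies in copies_by_gene.values():
        mass_at_level[copies] = mass_at_level.get(copies, 0) + copies
    curve = []
    running = 0
    for level in sorted(mass_at_level, reverse=True):
        running += mass_at_level[level]
        curve.append((level, running / total))
    return curve
```

Plotting the returned pairs with a logarithmic abundance axis would make
distinct abundance classes appear as separate components of the curve.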

Integration of Expression Information with the Genomic Map 

The SAGE expression data could be integrated with existing positional
information to generate chromosomal expression maps (Figure 4). These
maps were generated using the sequence of the yeast genome and the position
coordinates of ORFs obtained from the Stanford Yeast Genome Database.
Although there were a few genes that were noted to be physically proximal
and have similarly high levels of expression, there did not appear to be any
clusters of particularly high or low expression on any chromosome. Genes
like histones H3 and H4, which are known to have coregulated divergent
promoters and are immediately adjacent on chromosome 14 (Smith and
Murray, 1983), had very similar expression levels (5 and 6 copies per cell,
respectively). The distribution of transcripts among the chromosomes
suggested that overall transcription was evenly dispersed, with total transcript
levels being roughly linearly related to chromosome size (r² = 0.85, data not
shown). However, regions within 10 kb of telomeres appeared to be
uniformly undertranscribed, containing on average 3.2 tags per gene as
compared with 12.4 tags per gene for non-telomeric regions (Figure 4). This
is consistent with the previously described observations of "telomeric
silencing" in yeast (Gottschling et al., 1990). Recent studies have reported
telomeric position effects as far as 4 kb from telomere ends (Renauld et al.,
1993).
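
The telomeric comparison above amounts to splitting genes by distance from
the chromosome ends and averaging tag counts in each group. A minimal sketch
follows, assuming each gene is described by a hypothetical tuple of
(chromosome, ORF start coordinate, tag count) and that chromosome lengths are
supplied separately; the function name is likewise hypothetical.

```python
def mean_tags_by_region(genes, chrom_lengths, window=10_000):
    """Average tags per gene for telomere-proximal and all other genes.

    A gene is treated as telomere-proximal when its start coordinate lies
    within `window` bp of either end of its chromosome. Returns a pair
    (telomeric_mean, non_telomeric_mean); a mean is None for an empty group.
    """
    telomeric, other = [], []
    for chrom, start, tags in genes:
        length = chrom_lengths[chrom]
        near_end = start <= window or start > length - window
        (telomeric if near_end else other).append(tags)

    def mean(values):
        return sum(values) / len(values) if values else None

    return mean(telomeric), mean(other)
```

The 10 kb default matches the window used in the text; narrowing it toward
the 4 kb figure cited above would be one way to probe how far the position
effect extends.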

Gene Expression Patterns

Table 1 lists the 30 most highly expressed genes, all of which are expressed
at greater than 60 mRNA copies per cell. As expected, these genes mostly
correspond to well characterized enzymes involved in energy metabolism and
protein synthesis and were expressed at similar levels in all three growth
states (examples in Figure 5). Some of these genes, including ENO2
(McAlister and Holland, 1982), PDC1 (Schmitt et al., 1983), PGK1
(Chambers et al., 1989), PYK1 (Nishizawa et al., 1989), and ADH1 (Denis et
al., 1983), are known to be dramatically induced in the glucose-rich growth
conditions used in this study. In contrast, glucose repressible genes such as
the GAL1/GAL7/GAL10 cluster (St John and Davis, 1979), and GAL3 (Bajwa
et al., 1988) were observed to be expressed at very low levels (0.3 or fewer
copies per cell). As expected for the yeast strain used in this study, mating
type a specific genes, such as the a factor genes (MFA1, MFA2) (Michaelis
and Herskowitz, 1988), and alpha factor receptor (STE2) (Burkholder and
Hartwell, 1985) were all observed to be expressed at significant levels (range
2 to 10 copies per cell), while mating type alpha specific genes (MFα1,
MFα2, STE3) (Hagen et al., 1986; Kurjan and Herskowitz, 1982; Singh et al.,
1983) were observed to be expressed at very low levels (<0.3 copies/cell).

Three of the highly expressed genes in Table 1 had not been previously 
characterized. One contained an ORF with predicted ribosomal function,
previously identified only by genomic sequence analysis. Analyses of all
SAGE data suggested that there were 2,684 such genes corresponding to
uncharacterized ORFs which were transcribed at detectable levels. The 30
most abundant of these transcripts were observed more than 30 times,
corresponding to at least 8 transcripts per cell (Table 2). The other two
highly expressed uncharacterized genes corresponded to ORFs not predicted
by analysis of the yeast genome sequence (NORF = nonannotated ORF).
Analyses of SAGE data suggested that there were approximately 160 NORF
genes transcribed at detectable levels. The 30 most abundant of these
transcripts were observed at least 9 times (Table 3 and examples in Figure 5).

Interestingly, one of the NORF genes (NORF5) was only expressed in
S phase arrested cells and corresponded to the transcript whose abundance
varied the most in the three states analyzed (>49 fold, Figure 5).
Comparison of S phase arrested cells to the other states also identified greater
than 9 fold elevation of the RNR2 and RNR4 transcripts (Figure 5). Induction
of these ribonucleotide reductase genes is likely to be due to the hydroxyurea
treatment used to arrest cells in S phase (Elledge and Davis, 1989).
Likewise, comparison of G2/M arrested cells identified elevation of RBL2
and dynein light chain, both microtubule associated proteins (Archer et al.,
1995; Dick et al., 1996). As with the RNR inductions, these elevated levels
seem likely to be related to the nocodazole treatment used to arrest cells in
the G2/M phase. While there were many relatively small differences between
the states (for example, NORF1, Figure 5), overall comparison of the three
states revealed surprisingly few dramatic differences; there were only 29
transcripts whose abundance varied more than 10 fold among the three
different states analyzed.

Discussion 

Analysis of a yeast transcriptome affords a unique view of the RNA
components defining cellular life. We observed gene expression levels to vary
over three orders of magnitude, with the transcripts involved in energy
metabolism and protein synthesis the most highly expressed. Key transcripts,
such as those encoding enzymes required for DNA replication (e.g. POL1 and
POL3), kinetochore proteins (NDC10 and SKP1), and many other interesting
proteins, were present at 1 or fewer copies per cell on average. These
abundances are consistent with previous qualitative data from reassociation
kinetics which suggested that the largest number of expressed genes was
present at 1 or 2 copies per cell. These observations indicate that low
transcript copy numbers are sufficient for gene expression in yeast, and
suggest that yeast possess a mechanism for rigid control of RNA abundance.

The synthesis of chromosomal expression maps presents a cataloging
of the expression level of genes, organized by their genomic positions. It is
not surprising that gene expression is well balanced throughout the 16
chromosomes of S. cerevisiae. As most genes have independent regulatory
elements, it would have been surprising to find a large number of physically
adjacent genes that had similar high levels of expression. Of the few genes
that were known to have coregulated divergent promoters, like the H3/H4
pair, SAGE data confirmed concordant levels of expression. For areas like
telomere ends that are known to be transcriptionally suppressed, SAGE data
corroborated low levels of expression. Other expected expression patterns
such as high levels of glucose induced glycolytic enzymes, low levels of
glucose repressed GAL genes, expression of mating type a specific genes, and
low levels of expression of mating type alpha genes, were observed. Finally,
identification of tags corresponding to NORF genes suggests that there is a
significant number of small proteins encoded by the yeast genome that were
undetected by the criteria used for systematic sequence analysis. The yeast
genome sequence has been annotated for all ORFs larger than 300 bp
(encoding proteins 100 amino acids or greater). Genes encoding proteins
below this cutoff are therefore commonly unannotated. This class of genes
might also be underrepresented in mutational collections because of the small
target size for mutagenesis, and given their small size, may encode proteins
with novel functions. The systematic knockout of these NORF genes will
therefore be of great interest.

Comparison of gene expression patterns from altered physiologic states
can provide insight into genes that are important in a variety of processes.
Comparison of transcriptomes from a variety of physiologic states should
provide a minimum set of genes whose expression is required for normal
vegetative growth, and another set composed of genes that will be expressed
only in response to specific environmental stimuli, or during specialized
processes. For example, recent work has defined a minimal set of 250 genes
required for prokaryotic cellular life (Mushegian and Koonin, 1996).
Examination of the yeast genome readily identified homologous genes for 196
of these, over 90% of which were observed to be expressed in the SAGE
analysis. Detailed analyses of yeast transcriptomes, as well as transcriptomes
from other organisms, should ultimately allow the generation of a minimal set
of genes required for eukaryotic life.

Like other genome-wide analyses, SAGE analysis of yeast 
transcriptomes has several potential limitations. First, a small number of 
transcripts would be expected to lack an NlaIII site and therefore would not
be detected by our analysis. Second, our analysis was limited to transcripts
found at least as frequently as 0.3 copies per cell. Transcripts expressed in
only a minute fraction of the cell cycle, or transcripts expressed in only a
fraction of the cell population, would not be reliably detected by our analysis.
Finally, mRNA sequence data are practically unavailable for yeast, and
consequently, some SAGE tags cannot be unambiguously matched to
corresponding genes. Tags which were derived from overlapping genes, or
genes which have unusually long 3' untranslated regions may be misassigned.
Increased availability of 3' UTR sequences in yeast mRNA molecules should
help to resolve the ambiguities.

Despite these potential limitations, it is clear that the analyses described 
here furnish both global and local pictures of gene expression, precisely 
defined at the nucleotide level. These data, like the sequence of the yeast
genome itself, provide simple, basic information integral to the interpretation
of many experiments in the future. The availability of mRNA sequence
information from EST sequencing as well as various genome projects, will
soon allow definition of transcriptomes from a variety of organisms, including
human. The data recorded here suggest that a reasonably complete picture
of a human cell transcriptome will require only about 10 - 20 fold more tags
than evaluated here, a number well within the practical realm achievable with
a small number of automated sequencers. The analysis of global expression 
patterns in higher eukaryotes is expected, in general, to be similar to those 
reported here for S. cerevisiae. However, the analysis of the transcriptome 
in different cells and from different individuals should yield a wealth of 
information regarding gene function in normal, developmental, and disease 
states. 

Experimental Procedures 
Yeast cell culture

The source of transcripts for all experiments was S. cerevisiae strain YPH499
(MATa ura3-52 lys2-801 ade2-101 leu2-Δ1 his3-Δ200 trp1-Δ63) (Sikorski
and Hieter, 1989). Logarithmically growing cells were obtained by growing
yeast cells to early log phase (3 x 10⁶ cells/ml) in YPD (Rose et al., 1990)
rich medium (YPD supplemented with 6 mM uracil, 4.8 mM adenine and 24
mM tryptophan) at 30°C. For arrest in the G1/S phase of the cell cycle,
hydroxyurea (0.1 M) was added to early log phase cells, and the culture was
incubated an additional 3.5 hours at 30°C. For arrest in the G2/M phase of
the cell cycle, nocodazole (15 µg/ml) was added to early log phase cells and
the culture was incubated for an additional 100 minutes at 30°C. Harvested
cells were washed once with water prior to freezing at -70°C. The growth
states of the harvested cells were confirmed by microscopic and flow
cytometric analyses (Basrai et al., 1996).
RNA isolation and Northern Blot Analysis 

Total yeast RNA was prepared using the hot phenol method as described 
(Leeds et al., 1991). mRNA was obtained using the MessageMaker Kit 
(Gibco/BRL) following the manufacturer's protocol. Northern blot analysis 
was performed as described (El-Deiry et al., 1993), using probes PCR 
amplified from yeast genomic DNA. 

SAGE protocol 

The SAGE method was performed as previously described (Velculescu et al., 
1995), with exceptions noted below. PolyA RNA was converted to 
double-stranded cDNA with a BRL synthesis kit using the manufacturer's protocol 
except for the inclusion of primer biotin-5'-T18-3'. The cDNA was cleaved 
with NlaIII (Anchoring Enzyme). As NlaIII sites were observed to occur 
once every 309 base pairs in three arbitrarily chosen yeast chromosomes (1, 
5, 10), 95% of yeast transcripts were predicted to be detectable with a 
NlaIII-based SAGE approach. After capture of the 3' cDNA fragments on 
streptavidin coated magnetic beads (Dynal), the bound cDNA was divided 
into two pools, and one of the following linkers containing recognition sites 
for BsmFI was ligated to each pool: 

Linker 1, 
5'-TTTGGATTTGCTGGTGCAGTACAACTAGGCTTAATAGGGACATG-3' (SEQ ID NO:1), 
5'-TCCCTATTAAGCCTAGTTGTACTGCACCAGCAAATCC[amino mod. C7]-3' (SEQ ID NO:2); 

Linker 2, 
5'-TTTCTGCTCGAATTCAAGCTTCTAACGATGTACGGGGACATG-3' (SEQ ID NO:3), 
5'-TCCCCGTACATCGTTAGAAGCTTGAATTCGAGCAG[amino mod. C7]-3' (SEQ ID NO:4). 

As BsmFI (Tagging Enzyme) cleaves 14 bp away from its recognition 
site, and the NlaIII site overlaps the BsmFI site by 1 bp, a 15 bp SAGE tag 
was released with BsmFI. SAGE tag overhangs were filled in with Klenow, 
and tags from the two pools were combined and ligated to each other. The 
ligation product was diluted and then amplified with PCR for 28 cycles with 
5'-GGATTTGCTGGTGCAGTACA-3' (SEQ ID NO:5) and 
5'-CTGCTCGAATTCAAGCTTCT-3' (SEQ ID NO:6) as primers. The PCR 
product was analyzed by polyacrylamide gel electrophoresis (PAGE), and the 
PCR product containing two tags ligated tail to tail (ditag) was excised. The 
PCR product was then cleaved with NlaIII, and the band containing the ditags 
was excised and self-ligated. After ligation, the concatenated products were 
separated by PAGE and products between 500 bp and 2 kb were excised. 
These products were cloned into the SphI site of pZero (Invitrogen). 
Colonies were screened for inserts by PCR with M13 forward and M13 
reverse sequences located outside the cloning site as primers. 
PCR products from selected clones were sequenced with the TaqFS 
DyePrimer kits (Perkin Elmer) and analyzed using a 377 ABI automated 
sequencer (Perkin Elmer), following the manufacturer's protocol. Each 
successful sequencing reaction identified an average of 26 tags; given a 90% 
sequencing reaction success rate, this corresponded to an average of about 
850 tags per sequencing gel. 
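
The relationship between a transcript and the tag it yields can be illustrated with a minimal Python sketch. It locates the 3'-most NlaIII site (CATG) in a transcript sequence and returns the 10 bp immediately 3' of it. The function name, the constants, and the use of a run of A's to stand in for the polyA tail are assumptions made for illustration; this is not the SAGE program group.

```python
ANCHOR_SITE = "CATG"   # NlaIII recognition sequence
TAG_LENGTH = 10        # bases 3' of the anchoring site that form the tag


def expected_sage_tag(transcript, tag_length=TAG_LENGTH):
    """Return the tag following the 3'-most CATG, or None if there is no site.

    `transcript` is the sense strand, 5' to 3', without its polyA tail.
    """
    seq = transcript.upper()
    site = seq.rfind(ANCHOR_SITE)
    if site == -1:
        return None  # no NlaIII site: this transcript yields no tag
    # Assumption: a run of A's stands in for the polyA tail, so a site close
    # to the 3' end still gives a full-length tag that ends in A's.
    with_tail = seq + "A" * tag_length
    start = site + len(ANCHOR_SITE)
    return with_tail[start:start + tag_length]
```

Only the 3'-most site matters because the cDNA is captured by its biotinylated 3' end after NlaIII cleavage. A transcript whose last site lies within a few bases of the 3' end returns a tag padded with A's, and a transcript with no site returns None.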

SAGE data analysis 

Sequence files were analyzed by means of the SAGE program group 
(Velculescu et al., 1995), which identifies the anchoring enzyme site with the 
proper spacing, extracts the two intervening tags, and records them in a 
database. The 68,691 tags obtained contained 62,965 tags from unique 
ditags and 5,726 tags from repeated ditags. The latter were counted only 
once to eliminate potential PCR bias of the quantitation, as described 
(Velculescu et al., 1995). Of 62,965 tags, 2,332 tags corresponded to linker 
sequences, and were excluded from further analysis. Of the remaining tags, 
4,342 tags could not be assigned, and were likely due to sequencing errors (in 
the tags or in the yeast genomic sequence). If all of these were due to tag 
sequencing errors, this corresponds to a sequencing error rate of about 0.7% 
per base pair (for a 10 bp tag), not far from what we would have expected 
under our automated sequencing conditions. However, some unassigned tags 
had a much higher than expected frequency of A's as the last five base pairs 
of the tag (5 of the 52 most abundant unassigned tags), suggesting that these 
tags were derived from transcripts containing anchoring enzyme sites within 
several base pairs from their polyA tails. Given the frequency of NlaIII sites 
in the genome (one in 309 base pairs), approximately 3% of transcripts were 
predicted to contain NlaIII sites within 10 bp of their polyA tails. 
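
The figures in the preceding paragraph can be rechecked with a short calculation. The Python sketch below uses only the counts stated in the text. The per-base error rate assumes that errors at each base of a tag are independent, and the near-tail fraction assumes that NlaIII sites are uniformly spaced at the genome-wide density; both are simplifying assumptions and not necessarily how the original estimates were derived.

```python
def tag_accounting():
    """Recompute the summary figures quoted in the text from its own counts."""
    total_tags = 68_691           # all tags obtained
    unique_ditag_tags = 62_965    # tags from unique ditags
    repeated_ditag_tags = 5_726   # tags from repeated ditags
    linker_tags = 2_332           # tags matching linker sequences
    unassigned_tags = 4_342       # tags that could not be assigned
    tag_length_bp = 10
    nlaiii_spacing_bp = 309       # one NlaIII site per 309 bp

    # The two ditag classes should account for every tag obtained.
    assert unique_ditag_tags + repeated_ditag_tags == total_tags

    analyzed_tags = unique_ditag_tags - linker_tags
    per_tag_error = unassigned_tags / analyzed_tags
    # Assumption: independent errors per base, so a tag is error-free with
    # probability (1 - e) ** tag_length_bp; solve for the per-base rate e.
    per_base_error = 1 - (1 - per_tag_error) ** (1 / tag_length_bp)
    # Assumption: uniform site density, so the chance that a site falls in
    # the last tag_length_bp bases is tag_length_bp / spacing.
    near_tail_fraction = tag_length_bp / nlaiii_spacing_bp
    return analyzed_tags, per_tag_error, per_base_error, near_tail_fraction
```

Under these assumptions 60,633 tags remain after linker removal, about 7.2% of them are unassigned, the implied per-base error rate is about 0.7%, and about 3% of transcripts would carry an NlaIII site within 10 bp of the polyA tail, in agreement with the values quoted above.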

As very sparse data are available for yeast mRNA sequences and efforts 
to date have not been able to identify a highly conserved polyadenylation 
signal (Irniger and Braus, 1994; Zaret and Sherman, 1982), we used 14 bp of 
SAGE tags (i.e. the NlaIII site plus the adjacent 10 bp) to search the yeast 
genome directly (yeast genome sequence obtained from the Stanford yeast 
genome ftp site (genome-ftp.stanford.edu) on August 7, 1996). Because only 
coding regions are annotated in the yeast genome, and SAGE tags can be 
derived from 3' untranslated regions of genes, a SAGE tag was considered to 
correspond to a particular gene if it matched the ORF or the region 500 bp 
3' of the ORF (locus names, gene names and ORF chromosomal coordinates 
were obtained from the Stanford yeast genome ftp site, and ORF descriptions 
were obtained from the MIPS www site (http://www.mips.biochem.mpg.de/) on 
August 14, 1996). ORFs were considered genes with known functions if they 
were associated with a three letter gene name, while ORFs without such 
designations were considered uncharacterized. 
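
The assignment rule described above can be sketched in Python as follows. The `Orf` record, its field names, and the coordinate convention (1-based, inclusive, with `start` less than `end` regardless of strand) are hypothetical choices for illustration; the text specifies only the rule itself, namely a match within the ORF or within 500 bp 3' of it, in the correct orientation.

```python
from dataclasses import dataclass
from typing import Optional

DOWNSTREAM_WINDOW = 500  # bp 3' of the ORF still attributed to the gene


@dataclass(frozen=True)
class Orf:
    locus: str                       # systematic locus name
    chrom: int
    start: int                       # lower coordinate, 1-based inclusive
    end: int                         # higher coordinate, 1-based inclusive
    strand: str                      # "+" or "-"
    gene_name: Optional[str] = None  # three letter gene name, if any


def tag_matches_orf(orf, chrom, pos, strand, window=DOWNSTREAM_WINDOW):
    """True if a tag at (chrom, pos, strand) falls in the ORF or its 3' window."""
    if chrom != orf.chrom or strand != orf.strand:
        return False  # wrong chromosome or incorrect orientation
    if orf.strand == "+":
        return orf.start <= pos <= orf.end + window
    # On the minus strand the 3' side lies at lower coordinates.
    return orf.start - window <= pos <= orf.end


def is_characterized(orf):
    """ORFs carrying a gene name are treated as genes with known functions."""
    return orf.gene_name is not None
```

The strand test implements the orientation requirement, and the asymmetric window reflects that only the region 3' of the ORF is searched, so a match upstream of the start codon is not attributed to the gene.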

As expected, SAGE tags matched transcribed portions of the genome 
in a highly non-random fashion, with 88% matching ORFs or their adjacent 
3' regions in the correct orientation (chi-squared P value < 10^-30). In instances 
when more than one tag matched a particular ORF in the correct orientation, 
the abundance was calculated to be the sum of the matched tags (for Figure 
2, Figure 3, and Figure 4). Tags that matched ORFs in the incorrect 
orientation were not used in abundance calculations. In instances when a tag 
matched more than one region of the genome (for example an ORF and a 
non-ORF region) only the matched ORF was considered. In some cases the 15th 
base of the tag could also be used to resolve ambiguities. For Figure 4, only 
tags that matched the genome once were used. 
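
The abundance rules in the preceding paragraph can be sketched as a small aggregation in Python. The input layout (a dictionary of tag counts and a dictionary listing, for each tag, its genome matches as pairs of locus and orientation flag) is hypothetical. Skipping tags that match more than one ORF in the correct orientation is also an assumption of this sketch, since the text resolves such cases individually.

```python
from collections import defaultdict


def orf_abundance(tag_counts, tag_matches):
    """Sum tag counts per ORF.

    tag_counts:  {tag: number of times the tag was observed}
    tag_matches: {tag: [(locus or None, in_correct_orientation), ...]}
                 where locus is None for a match outside any ORF region.
    """
    totals = defaultdict(int)
    for tag, count in tag_counts.items():
        # Non-ORF matches and matches in the incorrect orientation are ignored.
        sense_orfs = {
            locus
            for locus, correct in tag_matches.get(tag, [])
            if locus is not None and correct
        }
        if len(sense_orfs) != 1:
            continue  # assumption: unmatched or ambiguous tags are skipped
        totals[sense_orfs.pop()] += count
    return dict(totals)
```

A tag that matches both an ORF and a non-ORF region contributes to the ORF only, and several distinct tags matching the same ORF add up to a single abundance value.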

For the identification of NORF genes, only tags were considered that 
matched portions of the genome that were further than 500 bp 3' of a 
previously identified ORF, and were observed at least two times in the SAGE 
libraries. 
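
The NORF selection criteria can be expressed as a simple filter. In the Python sketch below, the function name and the tuple layout for annotated ORFs are hypothetical, and strand is used only to decide on which side of an ORF the 500 bp 3' window lies.

```python
def is_norf_candidate(count, chrom, pos, orf_regions, min_count=2, window=500):
    """Apply the two NORF criteria to one genome-matched tag.

    orf_regions: iterable of (chrom, start, end, strand) with start < end,
                 1-based inclusive coordinates, strand "+" or "-".
    """
    if count < min_count:
        return False  # must be observed at least two times
    for orf_chrom, start, end, strand in orf_regions:
        if orf_chrom != chrom:
            continue
        # Extend each ORF by the window on its 3' side only.
        if strand == "+":
            low, high = start, end + window
        else:
            low, high = start - window, end
        if low <= pos <= high:
            return False  # falls in an ORF or within 500 bp 3' of one
    return True
```

A tag passes only if it was seen at least twice and lies outside every annotated ORF and its 3' window, which are the two conditions stated above.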
TABLE 4 
Additional NORFs 

Tag          Chr   Position   Copies/cell
GGCGCAATTT     4    1108395   2
TAAGTGATGA     7     593382   2
TTGTTGAATT    10     608373   2
GAAGCAGTAA     3     155607   2
ACATATGTTA     4     916112   2
CCCTACACGG     6     223289   2
GTAATTGGAC    10     392099   2
ATCAGACAAA    14     687272   2
TTATGAAAGA    15      81263   2
ATTCGTTCTA    15     841970   2
AGCAGGAGTT    16     188350   2
TTCTATTAGG     2     418749   2
TGGATTTCAG     4    1224930   2
CAGATATAAT     5      52488   2
CTGTTTTGGG    11     374761   2
CATTTTTAGT    11     508212   2
TTGAAAAGAT    13     104160   2
TAAGCCCATC    13     251273   2
AGCGTCCTCA    15     832420   2
TTTAGTTAAT     2     477623   2
ATGGTAGCCA     3      56961   2
AATTAGACTA     3     162589   2
AGTGACTCTT     4    1490879   2
GGACTATAAG     5     251266   2
ACTTTTTCAG    10     159213   2
GTCATATAGT    13     158765   2
CAACAAAGTG    13     171166   2
GTGGGAAAGG    13     804600   2
TACTTTATAT    16     366449   2
AATACCAGCG     3     175540
GCCTTGTATA     4     372624
GGTACATTCA     5      67152
GATTTCTCTG     5     187462
TAGTTGCTCC     7     317108
GTAAGAAATC     7     836202
CTTGGGCTAT     8     107992
AAATGGTGAT    11     558686
ATCATTTGGG    12     199358
CTGAACTTTA    12     283720
CCAGAAGGAG    13     652873
CCGGTTACTA    15     803663
CGATGAGAAG    15    1004369
AAACCGTCCC    16     199141
TCATTCATAC     2     164728
TATCTTTTTG     4     169784
TTAGAATAAT     4     603508
GTACGCTGTG     5     118089
TATATTAATT     6      64228
GTTCTTGCCT     7     939579
ATATAGCTGC    10     181144
CCAAAAAAAA    11      91785
GAACTCCACA    11      94125
CCTTCACTGC    11     374172
CACATCATAA    11     625896
GAAGTATTGA    12     603999
TGCGCGTATA    13     206410
GGGTAGTACT    13     671730
TAGTTTTGTC    15      33475
CAATTCCTAC     1     172182   0.8
TTTGATTTGA     2      46431   0.8
GGCTCTGGTT     2     414510   0.8
CAGAAATAGC     2     565130   0.8
CTGTTATTTT     2     616054   0.8
CGAAGTCAAA     2     680605   0.8
CTCTAGATAA     3     171584   0.8
AGTCAAAATG     4     192750   0.8
GCGAGTTTAG     4     691301   0.8
GCTCCAATAG     4    1131020   0.8
TTTATTTGAG     4    1237501   0.8
GTTATATTGA     4    1401803   0.8
TGGGTTGAAG     5     251266   0.8
ATTTTATTTG     5     447729   0.8
ATCATAAAAA     5     548612   0.8
TTATATAAAA     6     223182   0.8
CTACTTCTGC     8      34653   0.8
ATAAGACAGT    10     227802   0.8
TTCATAAGTT    10     471894   0.8
TAAATCTGAG    11     145617   0.8
CTGGTAGAAA    11     151174   0.8
CACGTACACA    11     403208   0.8
CCAAGATCAA    11     425882   0.8
AGCTTGTTCC    12     234966   0.8
CACATTCGTT    12     759953   0.8
CTTACATATA    12     789781   0.8
TCTATAGCAA    13     228936   0.8
CCTTTCTGAA    13     297985   0.8
CCTTTAGAAT    13     777999   0.8
AATTAACACC    13     842122   0.8
GCGCAGGGGC    14     440984   0.8
TGTTTATAAA    14     661710   0.8
AAAAGTCATT    15      32081   0.8
TTCGTAAACT    15     680625   0.8
TTTTTGGAGT    15     888343   0.8
AGGCATCTTG    16     250284   0.8
AAATCAAAAC    16     453890   0.8
AATTGACGAA    16     560169   0.8
TTGATGATTT    16     582360   0.8
CCTGTTTTTG    16     643476   0.8
TTTTTAAAAA     1     101436   0.5









WO 98/32847 
PCT/US98/01216 



AAGTTTGATC     1     199848   0.5
AGCACCTATG     2      46913   0.5
TGATTTATCC     2     418946   0.5
ACTGCATCTG     2     680860   0.5
CAAGTTAGGA     2     744770   0.5
ATACCCAATT     3      29939   0.5
AACTTTGTAT     3      30056   0.5
GCGGCGGGTG     3      41645   0.5
AAAATTGTTC     3      57108   0.5
TCAAGTACTC     3     157855   0.5
AACTGTATGC     3     223882   0.5
CTATCGGCCA     3     278840   0.5
ACAAGCCCAA     3     289917   0.5
GTACAGGGCT     4      93873   0.5
AAGATCATCG     4     254851   0.5
GAACTCCTGG     4     340891   0.5
GAACGAGAAG     4     371850   0.5
TTTTTAATAC     4     372058   0.5
TCTCCAGTTG     4     381712   0.5
AATACGTTAC     4     471791   0.5
ACGATTGGCT     4     509158   0.5
TGTTTATAAG     4     521709   0.5
CGTTTTCGTC     4     538839   0.5
TCGAACCTCT     4     578702   0.5
TCCACACACA     4     930972   0.5
CCGTGCGTGC     4    1324367   0.5
TTTCTTCAAC     5     116099   0.5
CCAAGTCTCG     5     159320   0.5
AGAGCGAATT     5     207517   0.5
TGTAGATTAT     5     280465   0.5
AAAAGTAGTT     5     286387   0.5
ACTTGGTATG     5     422942   0.5
TTAATGTTAT     5     544523   0.5
TACACGCGCG     5     544555   0.5
GGTCACTCCT     6      62983   0.5
AAGTGATGAA     6      76141   0.5
TTTATCTTGT     6     130327   0.5
AGTGATTGTT     6     256223   0.5
GCTTTGTTGT     7      72577   0.5
TCATTGATTC     7     110590   0.5
TTCACGGGAA     7     323655   0.5
ACTATTCTGT     7     423957   0.5
GGGCCAACCC     7     433787   0.5
AAAATATCTT     7     559397   0.5
TAGTAGTAAC     7     622201   0.5
AAGCGGACAA     7     735909   0.5
TCGCTGTTTT     7     800300   0.5
TGTATTTTTG     7     836202   0.5
CTAAACAAAG     7     836587   0.5
TAGGAAGAAA     7     905046   0.5
GGAAAAATTA     7     958839   0.5
TTTGGATAGT 

CGTTTGTGTA 

AGAAAAAAAC 

TAAAGTCCAG 

TAAGCAGATT 

ATGAGCATTT 

AGGTGCAAAA 

TAACAAAGAG 

CAATTGGCAA 

ACTCCCTGTA 

CTCTATTGAT 

GCTTTCCTTT 

ACCGCAAAGA 

CTTGTTCAAA 

AATGTGCTGT 

GCAGATAGCG 

TCTGACTTAG 

CCCGGATGTT 

GTAACGATTG 

GAATAACGAA 

ACTGCTATTT 

GTTCTCTAGC 

CATCACCATC 

TTGCACTTCT 

ACTGTTTATG 

TTGCTATATA 

TACATTCTAA 

CTCTTAGTTG 

ACGAACACTT 

TGCGCAAGTC 

TTTTTCTTAA 

CAAATGCATT 

CAAATTGTGT 

GCAATACTAT 

AGTGACGATG 

TACTGGTTTA 

GTTTGACCTA 

AGCGTTTGAT 

CTCTGTTGCG 

AAATTCAAAA 

TTTGCTTGGT 

AGTTTTCCTG 

TTTAAAGATA 

AAGGAGACAC 

CTATATATCA 

GATGGAATAG 

TCGAGTCGAA 

AAAAAAGAAA 

TTTCCAGAAT 

TGGACAATGT 

GGAATTAAGA 



7 
8 
8 
8 
8 
9 
9 

10 

10 

11 

11 

11 

11 

12 

12 

12 

12 

12 

12 

12 

12 

12 

12 

12 

12 

12 
13 

13 

13 

13 

13 

13 

14 

14 

14 

14 

14 

15 

15 

15 

15 

15 

15 

15 

15 

15 

15 

15 

15 



974754 
202655 
386651 
518998 
529129 
97114 
229077 
628227 
721781 
93528 
144281 
146665 
231872 
230972 
320426 
341324 
368780 
433912 
449917 
673851 
712476 
712712 
794710 
806833 
867350 
1017911 
ACTATATGTT    16     582230   0.5
GATATATCAT    16     589647   0.5
AGAATTGATT    16     744406   0.5
CACTGTCTCC    16     824649   0.5

1. An isolated DNA molecule comprising a yeast gene which is involved 
in cell cycle progression selected from the group of NORF genes 
identified in Tables 3 and 4. 

2. The isolated DNA molecule of claim 1 wherein expression of 
the NORF gene varies by at least 10% between any two phases of the 
cell cycle selected from the group consisting of: log phase, S phase, and 
G2/M. 

3. The isolated DNA molecule of claim 1 wherein expression of 
the NORF gene varies by at least 25% between any two phases of the 
cell cycle selected from the group consisting of: log phase, S phase, and 
G2/M. 

4. The isolated DNA molecule of claim 1 wherein expression of 
the NORF gene varies by at least 50% between any two phases of the 
cell cycle selected from the group consisting of: log phase, S phase, and 
G2/M. 

5. The isolated DNA molecule of claim 1 wherein expression of 
the NORF gene varies by at least 100% between any two phases of the 
cell cycle selected from the group consisting of: log phase, S phase, and 
G2/M. 

6. The isolated DNA molecule of claim 1 wherein expression of 
the NORF gene varies by a statistically significant difference (greater 
than 95% confidence level) between any two phases of the cell cycle 
selected from the group consisting of: log phase, S phase, and G2/M. 

7. The isolated DNA molecule of claim 6 wherein the NORF is 
selected from the group consisting of NORF Nos. 1, 2, 4, 5, 6, 17, 25, 
and 27. 

8. The isolated DNA molecule of claim 1 wherein the NORF gene 
is not repressed in at least one phase of the cell cycle selected from the 
group consisting of: log phase, S phase, and G2/M. 

9. The isolated DNA molecule of claim 1 which is genomic. 

10. The isolated DNA molecule of claim 1 which is cDNA. 

11. A method of using yeast genes to affect the cell cycle, 
comprising the step of: 

administering to a cell an isolated DNA molecule comprising a 
yeast gene which is involved in cell cycle progression selected from the 
differentially expressed genes identified in Tables 1, 2, 3, and 4. 

12. The method of claim 11 wherein the cell is a yeast cell. 

13. The method of claim 11 wherein the cell is a fungal cell. 

14. The method of claim 11 wherein the cell is a mammalian cell. 

15. The method of claim 11 wherein the yeast gene is selected from 
the group consisting of NORF Nos. 1, 2, 4, 5, 6, 17, 25, and 27. 

16. The method of claim 11 wherein the yeast gene is selected from 
the group consisting of: TEF1/TEF2, ENO2, ADH1, ADH2, PGK1, 
CUP1A/CUP1B, and PYK1. 

17. The method of claim 11 wherein the yeast gene is selected from 
the group consisting of: YKL056C, YMR116C, YEL033W, 
YOR182C, YCR013C, and YJR085C. 

18. A method for screening candidate antifungal drugs, comprising 
the steps of: 

contacting a test substance with a yeast cell; 

monitoring expression of a yeast gene which is involved in cell 
cycle progression selected from the group of yeast genes identified in Tables 
1, 2, 3, and 4, wherein a test substance which modifies the expression of the 
yeast gene is a candidate antifungal drug. 

19. The method of claim 18 wherein the yeast gene is selected from 
the group consisting of NORF Nos. 1, 2, 4, 5, 6, 17, 25, and 27. 

20. The method of claim 18 wherein the yeast gene is selected from 
the group consisting of: TEF1/TEF2, ENO2, ADH1, ADH2, PGK1, 
CUP1A/CUP1B, and PYK1. 

21. The method of claim 18 wherein the yeast gene is selected from 
the group consisting of: YKL056C, YMR116C, YEL033W, 
YOR182C, YCR013C, and YJR085C. 

22. A method for identifying human genes which are involved in 
cell cycle progression, comprising the steps of: 

hybridizing a probe comprising at least 10 contiguous 
nucleotides of a yeast gene which is differentially expressed between at least 
two phases selected from the group consisting of log phase, S phase, and 
G2/M phase, wherein the yeast gene is identified in Table 1, 2, 3, or 4. 

23. The method of claim 22 wherein the yeast gene is selected from 
the group consisting of NORF Nos. 1, 2, 4, 5, 6, 17, 25, and 27. 

24. The method of claim 22 wherein the yeast gene is selected from 
the group consisting of: TEF1/TEF2, ENO2, ADH1, ADH2, PGK1, 
CUP1A/CUP1B, and PYK1. 

25. The method of claim 22 wherein the yeast gene is selected from 
the group consisting of: YKL056C, YMR116C, YEL033W, 
YOR182C, YCR013C, and YJR085C. 

26. A probe for ascertaining phase in the cell cycle of a cell, 
wherein the probe comprises at least 14 contiguous nucleotides of a 
NORF gene as identified in Table 3 or 4. 

27. The probe of claim 26 wherein expression of the NORF gene 
varies by at least 10% between any two phases of the cell cycle selected 
from the group consisting of: log phase, S phase, and G2/M. 

28. The probe of claim 26 wherein expression of the NORF gene 
varies by at least 25% between any two phases of the cell cycle selected 
from the group consisting of: log phase, S phase, and G2/M. 

29. The probe of claim 26 wherein expression of the NORF gene 
varies by at least 50% between any two phases of the cell cycle selected 
from the group consisting of: log phase, S phase, and G2/M. 

30. The probe of claim 26 wherein expression of the NORF gene 
varies by at least 100% between any two phases of the cell cycle 
selected from the group consisting of: log phase, S phase, and G2/M. 

31. The probe of claim 26 wherein the NORF gene is not expressed 
in at least one phase of the cell cycle selected from the group consisting 
of: log phase, S phase, and G2/M. 

32. The probe of claim 26 wherein expression of the NORF gene 
varies by a statistically significant difference (greater than 95% 
confidence level) between any two phases of the cell cycle selected 
from the group consisting of: log phase, S phase, and G2/M. 

33. The probe of claim 32 wherein the gene is selected from the 
group consisting of NORF Nos. 1, 2, 4, 5, 6, 17, 25, and 27. 

34. The method of claim 18 wherein said step of monitoring 
expression is performed using nucleic acid molecules which are 
immobilized on a solid support. 

35. The method of claim 34 wherein the nucleic acid molecules are 
in an array. 

36. The method of claim 19 wherein a probe which comprises a 
portion of said yeast gene is in an array on a solid support. 

37. An array of probes on a solid support wherein at least one probe 
comprises at least 14 contiguous nucleotides of a NORF gene as 
identified in Table 3 or 4. 

38. The array of claim 37 wherein the NORF gene is selected from 
the group consisting of NORF Nos. 1, 2, 4, 5, 6, 17, 25, and 27. 

39. The array of claim 37 which comprises at least 100 probes of 
distinct sequence. 

40. The array of claim 37 which comprises at least 500 probes of 
distinct sequence. 

41. The array of claim 37 which comprises at least 1,000 probes 
of distinct sequence. 